Automatic summarising: factors and directions
This position paper suggests that progress with automatic summarising demands
a better research methodology and a carefully focussed research strategy. In
order to develop effective procedures it is necessary to identify and respond
to the context factors, i.e. input, purpose, and output factors, that bear on
summarising and its evaluation. The paper analyses and illustrates these
factors and their implications for evaluation. It then argues that this
analysis, together with the state of the art and the intrinsic difficulty of
summarising, imply a nearer-term strategy concentrating on shallow, but not
surface, text analysis and on indicative summarising. This is illustrated with
current work, from which a potentially productive research programme can be
developed.
Ontology Based Query Expansion with a Probabilistic Retrieval Model
This paper examines the use of ontologies for defining query context. The information retrieval system used is based on the probabilistic retrieval model. We extend the use of relevance feedback (RFB) and pseudo-relevance feedback (PF) query expansion techniques using information from a news-domain ontology. The aim is to assess the impact of the ontology on the query expansion results with respect to recall and precision. We also tested the results while varying the relevance feedback parameters (number of terms or number of documents). The factors which influence the success of ontology-based query expansion are outlined. Our findings show that ontology-based query expansion has had mixed success. The use of the ontology has vastly increased the number of relevant documents retrieved; however, we conclude that for both types of query expansion, the PF results are better than the RFB results.
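The core pseudo-relevance feedback (PF) step that the paper extends can be sketched as follows. This is a minimal illustration assuming simple term-frequency scoring over the top-ranked documents; the function name and scoring are illustrative, not the paper's implementation, and the ontology-based variant would additionally filter or weight candidate terms using the ontology:

```python
from collections import Counter

def prf_expand(query_terms, top_docs, n_terms=5):
    """Pseudo-relevance (blind) feedback: expand the query with the
    most frequent new terms from the top-ranked documents."""
    counts = Counter()
    for doc in top_docs:
        counts.update(t for t in doc.split() if t not in query_terms)
    expansion = [term for term, _ in counts.most_common(n_terms)]
    return list(query_terms) + expansion

docs = ["stock market news report", "market crash news today"]
print(prf_expand(["market"], docs, n_terms=2))  # ['market', 'news', 'stock']
```

The two parameters varied in the paper's experiments (number of terms, number of documents) correspond here to `n_terms` and the size of `top_docs`.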
A Latent Dirichlet Framework for Relevance Modeling
Relevance-based language models operate by estimating the probabilities of observing words in documents relevant (or pseudo-relevant) to a topic. However, these models assume that if a document is relevant to a topic, then all tokens in the document are relevant to that topic. This could limit model robustness and effectiveness. In this study, we propose a Latent Dirichlet relevance model, which relaxes this assumption. Our approach derives from current research on Latent Dirichlet Allocation (LDA) topic models. LDA has been extensively explored, especially for generating a set of topics from a corpus. A key attraction is that in LDA a document may be about several topics. LDA itself, however, has a limitation that is also addressed in our work. Topics generated by LDA from a corpus are synthetic, i.e., they do not necessarily correspond to topics identified by humans for the same corpus. In contrast, our model explicitly considers the relevance relationships between documents and given topics (queries). Thus, unlike standard LDA, our model is directly applicable to goals such as relevance feedback for query modification and text classification, where topics (classes and queries) are provided upfront. Although the focus of our paper is on improving relevance-based language models, in effect our approach bridges relevance-based language models and LDA, addressing limitations of both. Finally, we propose an idea that takes advantage of the “bag-of-words” assumption to reduce the complexity of the Gibbs-sampling-based learning algorithm.
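For orientation, the baseline that this work relaxes can be sketched as the standard relevance-based language model estimate of P(w|R). The sketch below assumes uniform weights over the (pseudo-)relevant documents, a simplification of the full model, which weights each document by its estimated relevance; it is not the paper's Latent Dirichlet variant:

```python
from collections import Counter

def relevance_model(pseudo_rel_docs):
    """Relevance-based language model: estimate P(w|R) by averaging
    each document's maximum-likelihood term distribution, with
    uniform document weights (a simplification of the full model)."""
    p_w = Counter()
    for doc in pseudo_rel_docs:
        tokens = doc.split()
        for w, c in Counter(tokens).items():
            p_w[w] += (c / len(tokens)) / len(pseudo_rel_docs)
    return dict(p_w)

model = relevance_model(["a b a", "a c"])
# doc1 gives a=2/3, b=1/3; doc2 gives a=1/2, c=1/2; each weighted 1/2
print(round(model["a"], 4))  # 0.5833
```

The paper's objection is visible here: every token of a relevant document contributes to P(w|R), whereas the proposed model lets only some tokens of a document be relevant to the topic.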
A Design Engineering Approach for Quantitatively Exploring Context-Aware Sentence Retrieval for Nonspeaking Individuals with Motor Disabilities
Nonspeaking individuals with motor disabilities typically have
very low communication rates. This paper proposes a design
engineering approach for quantitatively exploring context-aware
sentence retrieval as a promising complementary input
interface, working in tandem with a word-prediction keyboard.
We motivate the need for complementary design engineering
methodology in the design of augmentative and alternative
communication and explain how such methods can be used to
gain additional design insights. We then study the theoretical
performance envelopes of a context-aware sentence retrieval
system, identifying potential keystroke savings as a function of
the parameters of the subsystems, such as the accuracy of the
underlying auto-complete word prediction algorithm and the
accuracy of sensed context information under varying assumptions.
We find that context-aware sentence retrieval has the
potential to provide users with considerable improvements in
keystroke savings under reasonable parameter assumptions of
the underlying subsystems. This highlights how complementary
design engineering methods can reveal additional insights
into design for augmentative and alternative communication.
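The keystroke-savings quantity explored above can be made concrete with the standard metric used in text-entry research, sketched here with purely illustrative numbers; the paper's actual parameterisation over subsystem accuracies is not reproduced:

```python
def keystroke_savings(target_chars, keystrokes_used):
    """Keystroke savings (KS): fraction of character entries avoided
    relative to typing the full message letter by letter."""
    return (target_chars - keystrokes_used) / target_chars

# Illustrative case: retrieving a 40-character sentence with 8 keystrokes
print(keystroke_savings(40, 8))  # 0.8
```

In a theoretical performance envelope, `keystrokes_used` would itself be a function of the subsystem parameters, such as word-prediction accuracy and context-sensing accuracy.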
Unbiased Comparative Evaluation of Ranking Functions
Eliciting relevance judgments for ranking evaluation is labor-intensive and
costly, motivating careful selection of which documents to judge. Unlike
traditional approaches that make this selection deterministically,
probabilistic sampling has shown intriguing promise since it enables the design
of estimators that are provably unbiased even when reusing data with missing
judgments. In this paper, we first unify and extend these sampling approaches
by viewing the evaluation problem as a Monte Carlo estimation task that applies
to a large number of common IR metrics. Drawing on the theoretical clarity that
this view offers, we tackle three practical evaluation scenarios: comparing two
systems, comparing systems against a baseline, and ranking systems. For
each scenario, we derive an estimator and a variance-optimizing sampling
distribution while retaining the strengths of sampling-based evaluation,
including unbiasedness, reusability despite missing data, and ease of use in
practice. In addition to the theoretical contribution, we empirically evaluate
our methods against previously used sampling heuristics and find that they
generally cut the number of required relevance judgments at least in half.
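The Monte Carlo view described above can be illustrated with a minimal importance-sampling (Horvitz-Thompson style) estimator of precision@k: ranks are drawn from a sampling distribution and judged documents are reweighted by the inverse of their sampling probability, which keeps the estimate unbiased. The function and setup are an illustrative sketch, not the paper's estimators or sampling distributions:

```python
import random

def sampled_precision(ranked_docs, judgments, sample_dist, n_samples=1000, seed=0):
    """Unbiased Monte Carlo estimate of precision@k: sample ranks
    from sample_dist and reweight relevance labels by 1/probability."""
    rng = random.Random(seed)
    k = len(ranked_docs)
    total = 0.0
    for _ in range(n_samples):
        i = rng.choices(range(k), weights=sample_dist)[0]
        rel = judgments.get(ranked_docs[i], 0)
        # Horvitz-Thompson reweighting: E[rel_i / (k * q_i)] = precision@k
        total += rel * (1 / k) / sample_dist[i]
    return total / n_samples

est = sampled_precision(["d1", "d2", "d3", "d4"], {"d1": 1, "d3": 1},
                        [0.4, 0.3, 0.2, 0.1])
print(round(est, 3))
```

The point of the paper's variance-optimizing distributions is precisely to choose `sample_dist` so that far fewer samples (judgments) are needed for a given estimation accuracy.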
Combining global and local semantic contexts for improving biomedical information retrieval
Presented at the European Conference on Information Retrieval 2011. In the context of biomedical information retrieval (IR), this paper explores the relationship between the document's global context and the query's local context in an attempt to overcome the term mismatch problem between the user query and documents in the collection. Most solutions to this problem have focused on expanding the query by discovering its context, either global or local. In a global strategy, all documents in the collection are used to examine word occurrences and relationships in the corpus as a whole, and this information is used to expand the original query. In a local strategy, the top-ranked documents retrieved for a given query are examined to determine terms for query expansion. We propose to combine the document's global context and the query's local context in an attempt to increase the term overlap between the user query and documents in the collection via document expansion (DE) and query expansion (QE). The DE technique is based on a statistical method (IR-based) to extract the most appropriate concepts (global context) from each document. The QE technique is based on a blind feedback approach using the top-ranked documents (local context) obtained in the first retrieval stage. A comparative experiment on the TREC 2004 Genomics collection demonstrates that the combination of the document's global context and the query's local context shows a significant improvement over the baseline. The MAP is significantly raised from 0.4097 to 0.4532, an improvement rate of +10.62% over the baseline. The IR performance of the combined method in terms of MAP is also superior to the official runs submitted to TREC 2004 Genomics and is comparable to the performance of the best run (0.4075).
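Since the reported gains are in mean average precision (MAP, the mean of per-query average precision), a minimal sketch of the underlying average-precision computation may help. This is the standard definition, not code from the paper:

```python
def average_precision(ranked_docs, relevant):
    """Average precision: mean of precision@k taken at each rank k
    where a relevant document appears, over all relevant documents."""
    hits, score = 0, 0.0
    for k, doc in enumerate(ranked_docs, start=1):
        if doc in relevant:
            hits += 1
            score += hits / k
    return score / len(relevant) if relevant else 0.0

# Relevant docs at ranks 1 and 3: AP = (1/1 + 2/3) / 2
print(round(average_precision(["d1", "d2", "d3"], {"d1", "d3"}), 4))  # 0.8333
```

Averaging this quantity over all queries of the TREC 2004 Genomics topic set yields the MAP figures quoted above (0.4097 baseline vs. 0.4532 combined).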
Interaction: Beyond retrieval
No abstract. Peer reviewed.